System and method for multi-agent event detection and recognition

ABSTRACT

A method and system for creating a histogram of oriented occurrences (HO2) is disclosed. A plurality of entities in at least one image are detected and tracked. One of the plurality of entities is designated as a reference entity. A local 2-dimensional ground plane coordinate system centered on and oriented with respect to the reference entity is defined. The 2-dimensional ground plane is partitioned into a plurality of non-overlapping bins, the bins forming a histogram, a bin tracking a number of occurrences of an entity class. An occurrence of at least one other entity of the plurality of entities located in the at least one image may be associated with one of the plurality of non-overlapping bins. A number of occurrences of entities of at least one entity class in at least one bin may be loaded into a vector to define an HO2 feature.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of co-pending U.S. non-provisional patent application Ser. No. 12/489,667 filed Jun. 23, 2009, which claims the benefit of U.S. provisional patent application No. 61/074,775 filed Jun. 23, 2008, the disclosures of which are incorporated herein by reference in their entirety.

GOVERNMENT RIGHTS IN THIS INVENTION

This invention was made with U.S. government support under contract number NBCH-C-07-0062. The U.S. government has certain rights in this invention.

FIELD OF THE INVENTION

The invention relates generally to multi-agent event detection and recognition systems. More specifically, the invention relates to a system and method for providing a computational framework for automated detection and recognition of events in video images.

BACKGROUND OF THE INVENTION

Automated detection and recognition of events in video is desirable for video surveillance, video search, automated performance evaluation and advanced after-action-review (AAR) for training, and autonomous robotics. A class of events known as “multi-agent” events involves interactions among multiple entities (e.g., people and vehicles) over space and time. Multi-agent events may generally be inferred from the participating object types, object tracks and inter-object relationships observed within the context of the environment. Some simple examples of multi-agent events include vehicles traveling as a convoy, people walking together as a group, meetings in a parking lot, etc. Although these simple events demonstrate the concept of multi-agent events, it is desirable for a system to recognize more complex multi-agent events. Examples of more complex multi-agent events include the arrival/departure of a VIP with security detail, loading and unloading with guards or in the presence of unrelated people, meetings led by a few individuals, and coordinated team actions such as sports plays and military exercises.

Recent work has addressed modeling, analysis and recognition of complex events. The descriptors used in such approaches include object states, such as start, stop, move or turn, interactions among entities that are typically instantaneous, and pair-wise measures such as relative distance or relative speed between an entity and a reference point or a reference entity. The pair-wise measurements sometimes are quantized into Boolean functions such as Approaches, Meets/Collides, etc.

Unfortunately, such approaches in the prior art rely on a pre-defined ontology of an event and fixed numbers and types of objects that participate in an event. Moreover, although pair-wise measurements are effective for events involving a small number of entities, they are inadequate and inefficient for representing and analyzing complex configurations of multiple entities that interact with each other simultaneously. For example, with the same relative distance and the same relative speed, two people may be walking together or one may be following the other, and these cases indicate different relationships between the two people.

Accordingly, what would be desirable, but has not yet been provided, is a system and method for effectively and automatically capturing complex interactions amongst multiple entities including context over space and time.

SUMMARY OF THE INVENTION

The above-described problems are addressed and a technical solution achieved in the art by providing a method and system for creating a histogram of oriented occurrences (HO2), the method being executed by at least one processor, comprising the steps of: (a) detecting and tracking a plurality of entities in at least one image; (b) designating one of the plurality of entities as a reference entity; (c) defining a local 2-dimensional ground plane coordinate system centered on and oriented with respect to the reference entity; and (d) partitioning the 2-dimensional ground plane into a plurality of non-overlapping bins, the bins forming a histogram, a bin tracking a number of occurrences of an entity class. According to an embodiment of the present invention, the method may further comprise the step of associating an occurrence of at least one other entity of the plurality of entities located in the at least one image with one of the plurality of non-overlapping bins. Each of the plurality of entities may be one of an object and the location of an object in the at least one image. Steps (a)-(d) may be computed over one of a time instance and a time interval. The partitioned 2-dimensional ground plane may move when the reference entity moves. Step (d) may further comprise the step of employing a parts-based partition, wherein the parts-based partition measures a distance to the reference entity as the shortest distance to a point on the boundary of the reference entity and the bins are defined based on important parts of the reference entity.

According to an embodiment of the present invention, the method may further comprise the step of loading a number of occurrences of entities of at least one entity class in at least one bin into a vector to define an HO2 feature.

According to an embodiment of the present invention, the method may further comprise the steps of: geo-referencing the detected entities and their tracks to the 2-dimensional ground plane; annotating each of the detected entities as one of a positive reference entity and a negative reference entity; computing an HO2 feature for each annotated entity, the annotated entity being used as a reference entity; selecting HO2 features with positive reference entities as positive training samples and selecting HO2 features with negative reference entities as negative training samples; classifying event samples with a classifier using the HO2 features using the positive and negative training samples; extracting a second set of entities from a plurality of images; computing HO2 features of each of the second set of entities, wherein, for each HO2 feature computation, a different one of the second set of entities is chosen as the reference entity; and classifying each of the second set of entities with the classifier to determine whether the event has occurred in the plurality of images. The classifier may be a support vector machine (SVM).

According to an embodiment of the present invention, the method may further comprise the steps of: computing HO2 features for an entity of interest from the plurality of entities over a sliding window of time to form a time sequence and clustering the time sequence using a clustering algorithm. The clustering algorithm may comprise constructing a hierarchical cluster tree using χ² distance and using a nearest neighbor strategy. The distance between two clusters is the smallest distance between objects in the two clusters. The clustering algorithm may further comprise constructing clusters recursively from the root to leaves of the hierarchical cluster tree based on one of an inconsistence measure and the maximum number of clusters.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention may be more readily understood from the detailed description of an exemplary embodiment presented below considered in conjunction with the attached drawings and in which like reference numerals refer to similar elements and in which:

FIG. 1 is a process flow diagram illustrating exemplary steps for creating a histogram of oriented occurrences (HO2), according to an embodiment of the present invention;

FIG. 2 depicts instances of HO2 features for two simple multi-entity events, specifically of two persons walking in different formations, according to an embodiment of the present invention;

FIG. 3 visually depicts HO2 features using a log-polar partition function;

FIG. 4 illustrates an example of a part-based partition with respect to a rectangular shaped building with one door;

FIG. 5 is a process flow diagram illustrating exemplary steps of a supervised multi-agent event detection method employing HO2 features, according to an embodiment of the present invention;

FIG. 6 is a process flow diagram illustrating exemplary steps of an unsupervised multi-agent event detection method employing HO2 features, according to an embodiment of the present invention;

FIG. 7 depicts a system for creating a histogram of oriented occurrences (HO2), according to an embodiment of the present invention;

FIG. 8 illustrates three action stages in an Unload3P clip and corresponding HO2 features;

FIG. 9 illustrates a plot of HO2 features computed over overlapping windows of 200 frames of an Unload3P clip and stacked over time;

FIG. 10 shows the temporal segmentation results of the Unload3P clips segmented into three clusters using HO2 features;

FIG. 11 illustrates HO2 features of three multi-agent complex events;

FIG. 12 illustrates an ROC curve of VIP event recognition;

FIG. 13 illustrates an ROC curve of Mounting/Dismounting event recognition;

FIG. 14 illustrates an ROC curve of Loading/Unloading event recognition; and

FIG. 15 illustrates a screen shot of a VIP detection and recognition system.

It is to be understood that the attached drawings are for purposes of illustrating the concepts of the invention and may not be to scale.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 is a process flow diagram illustrating exemplary steps for creating a histogram of oriented occurrences (HO2), according to an embodiment of the present invention. An HO2 aggregates the relative space-time relationships amongst multiple objects that can be used as an intermediate feature to detect and classify complex multi-object events. At step 102, at least one image is received from at least one image capturing device, such as a video/still camera, which may be rigid or moving. At step 104, a plurality of entities are detected and tracked in the at least one image. According to an embodiment of the present invention, each of the plurality of entities may be an object or the location of the object in the at least one image. At step 106, one of the plurality of entities is designated as a reference entity, which itself may be an entity or the location of an entity. According to an embodiment of the present invention, an HO2 captures the presence of entities with respect to a reference entity or a reference location over space and time. At step 108, given a reference entity, a local 2D ground plane coordinate system centered on and oriented with respect to the reference entity is defined. At step 110, the local 2D ground plane is partitioned into a plurality of non-overlapping bins, the bins forming a histogram, a bin tracking a number of occurrences of an entity class of interest. At step 112, the occurrence of at least one other entity of the plurality of entities located in the at least one image is associated with one of the plurality of non-overlapping bins. As a result, an HO2 is a histogram comprising a number of occurrences of each entity class of interest in each bin. Typical entity classes of interest are people, vehicles, buildings and roads. The HO2 may be computed at a time instance or over a time interval. When computed over a time interval, the HO2 is the accumulated occurrences of entities over the time interval. If the reference entity moves, such as a moving vehicle or person, the partitioning of the ground plane (or the bins) moves with the reference entity. As a result, an HO2 captures entity interactions within the frame of reference of the reference entity. According to an embodiment of the present invention, in step 114, the number of occurrences of each entity class in the HO2 is loaded into a vector, thereby defining an HO2 feature (vector). The entities tracked in each bin may move in various directions over a time interval.

FIG. 2 depicts instances of HO2 features for two simple multi-entity events, specifically of two persons walking in different formations, according to an embodiment of the present invention. FIG. 2a illustrates HO2 features (vectors) of two persons walking with one following the other (i.e., in single file) and FIG. 2b illustrates HO2 features of two persons walking side by side (i.e., in echelon formation), the HO2 features of FIGS. 2a and 2b exhibiting different spatial patterns. The 0 degree direction corresponds to the walking direction of the reference person. Therefore, a person walking behind the reference person corresponds to a cluster at 180 degrees in FIG. 2a and a person walking to the left of the reference person corresponds to a cluster at 90 degrees in FIG. 2b. An HO2 feature is robust to clutter or noise in real-world scenarios. An HO2 feature is invariant to translation. Through normalization with respect to the reference entity, an HO2 feature is also invariant to scaling and rotation. An HO2 feature may be used by standard statistical clustering and classification techniques for complex event detection and recognition.
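The angular convention described above can be illustrated with a minimal sketch. It assumes positions on the ground plane in arbitrary units, headings in degrees, and angles increasing counter-clockwise; the function name `relative_bearing_deg` is hypothetical and is not part of the disclosed method.

```python
import math

def relative_bearing_deg(ref_xy, ref_heading_deg, other_xy):
    """Angle of `other_xy` in the reference entity's frame, in [0, 360).

    0 degrees is the reference heading; angles grow counter-clockwise
    (an assumed convention, chosen so that "left" maps to 90 degrees).
    """
    dx = other_xy[0] - ref_xy[0]
    dy = other_xy[1] - ref_xy[1]
    absolute = math.degrees(math.atan2(dy, dx))
    return (absolute - ref_heading_deg) % 360.0

# Reference person at the origin, walking along +x (heading 0 degrees).
behind = relative_bearing_deg((0.0, 0.0), 0.0, (-1.0, 0.0))  # follower, single file
left = relative_bearing_deg((0.0, 0.0), 0.0, (0.0, 1.0))     # companion on the left
```

Because only the offset from the reference position and the difference from the reference heading enter the computation, shifting or rotating the whole scene leaves the result unchanged, which is the translation and rotation invariance noted above.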

More formally, in its most generalized form, an HO2 is a histogram of occurrences of entity classes of interest over a partition of a spatial-temporal volume with respect to a reference entity or a reference location. That is, given a partition function $P$ and a reference entity $R$, the 2D ground plane $\Omega$ is partitioned into $N$ non-overlapping regions or bins, $\Omega_i(R)$, such that $P(\Omega, R) = \{\Omega_1(R), \ldots, \Omega_N(R)\}$, where

$\Omega = \bigcup_{i=1}^{N} \Omega_i(R)$

and $\Omega_i(R) \cap \Omega_j(R) = \emptyset$ if $i \neq j$. Also, given a set of entity classes, $E = \{e_1, \ldots, e_M\}$, the HO2 feature $f(I,R,T)$ for video clip $I$, reference entity $R$ and time interval $T$ is an $N \times M \times K$ dimensional vector:

$f(I,R,T) = \{\, l_{i,j,k} ;\ i = 1, \ldots, N;\ j = 1, \ldots, M;\ k = 1, \ldots, K \,\}$

where $l_{i,j,k}$ is the number of occurrences of entities of class $e_j$ in $\Omega_i(R)$ that are moving toward the direction in bin $k$ over the time interval $T$, $K$ being the number of movement direction bins. Note that the bins, or the partition of the space, are a function of the reference entity $R$. When $R$ moves, the bins move with it.
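The definition above can be sketched in code as follows. This is an illustrative sketch only, under several assumptions that are not taken from the disclosure: the partition is a log-polar grid with `n_radial` rings between hypothetical radii `r_min` and `r_max`, the direction of movement is quantized uniformly into `n_dir` bins relative to the reference heading, and the flat index order is spatial bin, then class, then direction. The name `ho2_feature` and all parameters are hypothetical.

```python
import math

def ho2_feature(ref, observations, classes,
                n_radial=2, n_angular=4, n_dir=1, r_min=1.0, r_max=16.0):
    """Count occurrences per (spatial bin i, class j, direction bin k).

    ref:          (x, y, heading_deg) of the reference entity R
    observations: iterable of (class_name, x, y, heading_deg), excluding R
    Returns a flat list of length N*M*K with N = n_radial * n_angular.
    """
    rx, ry, rheading = ref
    n_classes = len(classes)
    feature = [0] * (n_radial * n_angular * n_classes * n_dir)
    for cls, x, y, heading in observations:
        if cls not in classes:
            continue
        r = math.hypot(x - rx, y - ry)
        if r >= r_max:                 # beyond the assumed outer radius
            continue
        r = max(r, r_min)              # innermost ring absorbs very close entities
        # Radial index: uniform in log(r) between r_min and r_max.
        radial = int(n_radial * math.log(r / r_min) / math.log(r_max / r_min))
        radial = min(radial, n_radial - 1)
        # Angular index: bearing measured from the reference heading.
        bearing = (math.degrees(math.atan2(y - ry, x - rx)) - rheading) % 360.0
        angular = int(bearing / (360.0 / n_angular)) % n_angular
        # Direction index: entity heading relative to the reference heading.
        rel_heading = (heading - rheading) % 360.0
        direction = int(rel_heading / (360.0 / n_dir)) % n_dir
        i = radial * n_angular + angular
        j = classes.index(cls)
        feature[(i * n_classes + j) * n_dir + direction] += 1
    return feature

# One vehicle ahead, one behind, and a pedestrian to the left of the reference.
scene = [("vehicle", 8.0, 1.0, 0.0), ("vehicle", -8.0, -1.0, 0.0),
         ("person", -0.5, 2.0, 270.0)]
f = ho2_feature((0.0, 0.0, 0.0), scene, ["vehicle", "person"])
```

To accumulate over a time interval in this sketch, the per-frame features can be summed element by element, recomputing `ref` for each frame so that the bins move with the reference entity.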

An HO2 feature is associated with three properties that can be optimized for different applications depending on the entity: (1) a partition function for defining bins used in histogram calculation; (2) a spatial and temporal support; and (3) entity categories on which the histograms are computed.

The partition function of an HO2, P(Ω, R), defines the bins, Ω_(i)(R), used to compute the histogram of entities of interest. In a preferred embodiment, an HO2 uses a log-polar partition, i.e., a uniform partition in the log-polar coordinate system, as used in a shape context feature. The log-polar partition is most appropriate when the reference entity can be modeled as an oriented point (a vector), such as when the reference entities are people and vehicles. However, it may not be adequate for large reference entities, such as buildings or roads.

An example depicting HO2 features using a log-polar partition function is shown in FIG. 3. The scenario in FIG. 3A shows a moving convoy, including a vehicle 302 and a vehicle 304, and a pedestrian 306 crossing the street. The reference entity 303 is the middle vehicle in a three-vehicle convoy. The coordinate system used to compute an HO2 is defined by the reference vehicle 303, which is shown by the car icon at the origin; the x-axis corresponds to the heading of the reference vehicle 303. The vehicles 302, 304 travel in front of and behind the reference vehicle 303, respectively. The two entity classes in this example are vehicles and people. The histograms of vehicle and people occurrences are shown in FIG. 3B and the resulting HO2 feature vector is shown in FIG. 3C. In this example, quantization according to the direction of movement is not shown.

When parts of the reference entity are important or the reference entity cannot be treated as a point, a parts-based partition may be used. A parts-based partition measures a distance to the reference entity as the shortest distance to a point on the boundary of the reference entity and the bins are defined based on the important parts of the reference entity. Although parts-based partitions are often defined manually, due to the small number of entity categories and the small number of parts for each entity category that are of interest, a parts-based partition function can be defined for each entity category beforehand and used for all events thereafter. FIG. 4 illustrates an example of a part-based partition with respect to a rectangular shaped building 402 with one door 404. A partition function may be specified for an entity class, such as the building 402, where the partition adapts to the entity class and components of the class, such as doors 404.
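As a rough illustration of the boundary distance used by a parts-based partition, the sketch below treats the building as an axis-aligned rectangle with a door on its lower wall. The rectangle representation, the door interval, the two part labels and the function names are all assumptions made for the example; the disclosure does not specify how the bins of FIG. 4 are laid out.

```python
import math

def boundary_distance(p, rect):
    """Shortest distance from point p to the boundary of an axis-aligned
    rectangle rect = (xmin, ymin, xmax, ymax)."""
    x, y = p
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - x, 0.0, x - xmax)
    dy = max(ymin - y, 0.0, y - ymax)
    if dx > 0.0 or dy > 0.0:                              # point is outside
        return math.hypot(dx, dy)
    return min(x - xmin, xmax - x, y - ymin, ymax - y)    # inside or on the boundary

def part_label(p, rect, door):
    """Hypothetical part assignment: 'door' if p lies in front of the door
    interval door = (x0, x1) on the lower wall, otherwise 'wall'."""
    x, y = p
    x0, x1 = door
    return "door" if (x0 <= x <= x1 and y <= rect[1]) else "wall"

building = (0.0, 0.0, 10.0, 6.0)      # 10 x 6 footprint
door = (4.0, 6.0)                     # door spans x in [4, 6] on the wall y = 0
d_corner = boundary_distance((13.0, 10.0), building)
d_front = boundary_distance((5.0, -2.0), building)
```

A bin could then be formed from the pair of a part label and a distance ring, so that people gathering near the door are counted separately from people standing along a wall.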

A second property associated with an HO2 feature, spatial and temporal support, is represented by L and T, respectively. L is the maximal distance to the reference entity/location. Entities located farther than L from the reference entity/location are not counted in an HO2 histogram. T is the duration of a temporal processing window.
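The role of L and T can be sketched as a simple filter applied before counting. The observation layout `(t, x, y)`, the half-open window `[t_start, t_start + T)` and the function name `in_support` are assumptions for illustration; for a moving reference entity, `ref_xy` would be the reference position at the time of the observation.

```python
import math

def in_support(ref_xy, obs, L, t_start, T):
    """True if observation obs = (t, x, y) lies within the temporal window
    of duration T starting at t_start and within distance L of ref_xy."""
    t, x, y = obs
    in_time = t_start <= t < t_start + T
    in_space = math.hypot(x - ref_xy[0], y - ref_xy[1]) <= L
    return in_time and in_space

# Keep only observations within 5 units of the reference during frames 0..9.
observations = [(5, 3.0, 4.0), (5, 3.0, 4.1), (10, 1.0, 1.0)]
kept = [o for o in observations if in_support((0.0, 0.0), o, 5.0, 0, 10)]
```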

A third property associated with an HO2 feature is the entity categories that are important for events in surveillance and training applications, such as people, vehicles, buildings, rooms and roads. According to an embodiment of the present invention, both vehicles and people may be used as reference entities; however, for simplicity, HO2 calculations may be limited to a people class when all the events in a video clip are people centric.

HO2 features may be used for different aspects of multi-agent event analysis. An HO2 feature may be directly used for event detection through supervised training, which is described hereinbelow with reference to FIG. 5. HO2 features may also be used in an unsupervised mode for temporal segmentation or clustering of entity tracks for event representation and recognition, to be described with reference to FIG. 6.

FIG. 5 is a process flow diagram illustrating exemplary steps 500 of a supervised multi-agent event detection method employing HO2 features, according to an embodiment of the present invention. Incoming video images are initially pre-processed for entity detection and tracking (extraction) at step 502. Pre-processing may be implemented by any of a number of existing methods. According to an embodiment of the present invention, pre-processing may be implemented by classifying the entities extracted from the incoming video images into broad categories such as people, vehicles, buildings and roads. At step 504, the extracted entities and their tracks are geo-referenced to the ground plane. This is achieved either through a manual calibration of the camera system or automated geo-registration using ortho-rectified reference imagery.

To recognize an event, annotated samples of the event are used as a training set. In a training sequence, there are a number of entities. At step 506, each entity is annotated as a correct (positive) reference entity or an incorrect (negative) reference entity (clutter). At step 507, for each annotated entity, an HO2 feature is calculated using the annotated entity as a reference entity. At step 508, the calculated HO2 features with positive entities used as the reference entity are selected as positive training samples and the remainder are selected as negative training samples. At step 510, event samples are classified with a classifier using the HO2 features. In one preferred method of classification, a support vector machine (SVM) is built using the above selected positive and negative training samples.

For a new video clip (i.e., a plurality of images), at step 512, entities are extracted. At step 514, HO2 features are computed for all entities extracted from the new video clip, wherein, for each HO2 feature computation, a different one of the entities is chosen as the reference entity. At step 516, the computed HO2 features are then classified using an SVM to recognize the given event (i.e., whether or not the event has occurred in the new video clip).

FIG. 6 is a process flow diagram illustrating exemplary steps 600 of an unsupervised multi-agent event detection method employing HO2 features, according to an embodiment of the present invention. In an unsupervised mode, the goal is to use temporal segmentation or clustering to partition a complex event temporally into non-overlapping time intervals. Each of the intervals comprises a single sub-event. Thus, in step 602, a set of incoming video images is initially pre-processed for entity detection and tracking. For an entity of interest e, such as a person, a vehicle, or a building, HO2 features are computed at step 604 using e as the reference entity. The aforementioned HO2 features are computed over a sliding window [t−Δt, t+Δt] to form a time sequence f(I, e, [t−Δt, t+Δt]). The time sequence f(I, e, [t−Δt, t+Δt]) may be clustered at step 606 using any clustering algorithm. According to an embodiment of the present invention, a hierarchical cluster tree may be constructed as the clustering algorithm using χ² distance and a nearest neighbor strategy. Using this approach, the distance between two clusters is the smallest distance between objects in the two clusters. For a cluster r of size n_(r) and a cluster s of size n_(s), the nearest neighbor distance d_(NN)(r,s) between r and s is: $d_{NN}(r,s) = \min\big(\mathrm{dist}(x_{ri}, x_{sj})\big),\ i \in (1, \ldots, n_r),\ j \in (1, \ldots, n_s)$.
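The nearest neighbor distance above can be sketched as follows. The disclosure does not give the formula for the χ² distance, so the sketch assumes one commonly used symmetric form, 0.5 · Σ (a_i − b_i)² / (a_i + b_i), with terms whose denominator is zero skipped; the function names are hypothetical.

```python
def chi2_distance(a, b):
    """Symmetric chi-square distance between two histograms of equal length
    (assumed form; terms with a zero denominator are skipped)."""
    total = 0.0
    for ai, bi in zip(a, b):
        if ai + bi > 0:
            total += (ai - bi) ** 2 / (ai + bi)
    return 0.5 * total

def nearest_neighbor_distance(cluster_r, cluster_s):
    """d_NN(r, s): smallest pairwise distance between members of r and s."""
    return min(chi2_distance(x, y) for x in cluster_r for y in cluster_s)

r = [[1, 0], [2, 0]]
s = [[0, 1], [3, 0]]
d = nearest_neighbor_distance(r, s)   # closest pair is [2, 0] and [3, 0]
```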

Once the hierarchical cluster tree is built, at step 608, clusters are constructed recursively from the root to leaves based on an inconsistence measure or the maximum number of clusters.
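A minimal sketch of the maximum-number-of-clusters criterion follows. Rather than building the full tree and cutting it from the root, it merges clusters bottom-up with the nearest neighbor (single linkage) rule and stops once the requested number of clusters remains; ties aside, this should give the same grouping as cutting the tree at that number of clusters. The inconsistence measure is not covered, the χ² form is the same assumed form as in common use, and all names are hypothetical.

```python
def chi2_distance(a, b):
    """Assumed symmetric chi-square distance; zero-denominator terms skipped."""
    return 0.5 * sum((ai - bi) ** 2 / (ai + bi)
                     for ai, bi in zip(a, b) if ai + bi > 0)

def single_linkage_labels(features, max_clusters):
    """Merge nearest clusters until at most `max_clusters` remain and
    return one cluster label per feature vector."""
    clusters = [[i] for i in range(len(features))]
    target = max(1, max_clusters)
    while len(clusters) > target:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Nearest neighbor distance between clusters a and b.
                d = min(chi2_distance(features[i], features[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(features)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# A toy time sequence of HO2-like vectors with three distinct stages.
sequence = [[4, 0, 0], [5, 0, 0], [0, 4, 0], [0, 5, 0], [0, 0, 4], [0, 0, 5]]
stages = single_linkage_labels(sequence, 3)
```

The pairwise search is cubic in the number of windows, which is acceptable for a short illustration but would likely be replaced by a library routine for long sequences.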

FIG. 7 depicts a system for creating a histogram of oriented occurrences (HO2), according to an embodiment of the present invention. By way of a non-limiting example, the system 710 receives digitized video or still images from one or more image capturing devices 712, such as one or more still or video cameras. The system 710 may also include a digital video capture system 714 and a computing platform 716. The digital video capturing system 714 processes streams of digital video, or converts analog video to digital video, to a form which can be processed by the computing platform 716. The digital video capturing system 714 may be stand-alone hardware, or cards such as Firewire cards which can plug in directly to the computing platform 716. According to an embodiment of the present invention, the image capturing devices 712 may interface with the video capturing system 714/computing platform 716 over a heterogeneous datalink, such as a radio link (e.g., between an aircraft and a ground station) and a digital data link (e.g., Ethernet, between the ground station and the computing platform 716). The computing platform 716 may include a personal computer or work-station (e.g., a Pentium-M 1.8 GHz PC-104 or higher) comprising one or more processors 720 which includes a bus system 722 which is fed by video data streams 724 via the one or more processors 720 or directly to a computer-readable medium 726. The computer readable medium 726 may also be used for storing the instructions of the system 710 to be executed by the one or more processors 720, including an operating system, such as the Windows or the Linux operating system. The computer readable medium 726 may further be used for the storing and retrieval of video clips of the present invention in one or more databases. The computer readable medium 726 may include a combination of volatile memory, such as RAM memory, and non-volatile memory, such as flash memory, optical disk(s), and/or hard disk(s). Portions of a processed video data stream 728 may be stored temporarily in the computer readable medium 726 for later output to a monitor 730. The monitor 730 may display the processed video data stream/still images. The monitor 730 may be equipped with a keyboard 732 and a mouse 734 for selecting objects of interest by an analyst.

Testing scenarios employing HO2 features are described hereinbelow. Two testing sets are used in the testing scenarios. The first set is a ground surveillance data set. This data set contains videos collected in a parking lot with staged events and normal background activities. The resolution of the data set is approximately 3-4 cm/pixel. There are 47 video sequences and 143 noise-free people tracks in the data set. For the first set, 5% Gaussian white noise is added.

The second set is a training exercise data set. This data set contains video collected during room clearing training exercises using cameras installed on ceilings. The resolution is approximately 3 cm/pixel. The video is captured for trainee performance evaluation and after-action-review. The tracks are generated using radio frequency identification (RFID) based triangulation. There are significant errors in the tracks due to multi-path interference.

The first example testing scenario uses HO2 features to segment a video sequence into different stages. FIGS. 8, 9 and 10 illustrate a representative sequence, named Unload3P, and the corresponding results. In FIG. 8, there are three video frames representing each of the three action stages in the Unload3P clip and their corresponding HO2 features. HO2 features are computed from the three frames using the car as the oriented reference entity. In this sequence, a first person walks out of a building towards his/her car as shown in FIG. 8a. Then, two additional people exit a car and unload a box from the trunk as shown in FIG. 8b. Next, the two people carry the box and walk towards the building as the first person continues walking towards his/her car in FIG. 8c.

FIG. 9 shows a plot of HO2 features computed over overlapping windows of 200 frames of the Unload3P clip and stacked over time, where the X-axis is the frame number and the Y-axis shows the 64 dimensions of the HO2 feature vector. The values of the components of the HO2 features are shown using the “hot” color map, wherein the larger the value, the hotter the color. The three segments corresponding to the three action stages in the video sequence are shown as walking towards a car, unloading a box from the trunk, and then walking away carrying a box.

The HO2 features are then clustered according to the χ² distance. From the clusters, the video clip is segmented into three stages as shown in FIG. 10, which matches the ground truth. FIG. 10 shows the temporal segmentation results of the Unload3P clips segmented into three clusters using HO2 features.

In the second example, the HO2 features are extracted from the whole sequences in the surveillance data set. Then, the HO2 features are used to recognize three events: Loading/Unloading, Mounting/Dismounting and VIP Arrival/Departure. FIG. 11 shows HO2 features of three multi-agent complex events: Non-VIP arrival/departure, VIP arrival/departure, and Loading/Unloading events. As FIG. 11 shows, HO2 features corresponding to the same event have similar patterns, while HO2 features corresponding to different events are quite distinct.

In this example, a support vector machine (SVM) is used as the classifier. For VIP recognition, an HO2 with respect to people is computed. Therefore, one HO2 feature for each person or each people track is computed. Then, the HO2 features are classified into two classes, VIP and Non-VIP, using the SVM. FIG. 12 shows an ROC curve of VIP event detection/recognition, where “TPR” is a true positive rate and “FPR” is a false positive rate. As shown in FIG. 12, an overall classification accuracy of 96.92% is achieved.

For Loading/Unloading and Mounting/Dismounting events, HO2 features with respect to vehicles are computed. Then, the HO2 features are classified using an SVM. The classification accuracy of Mounting/Dismounting vs. everything else is 96.15% and an ROC curve of the classification results is shown in FIG. 13. FIG. 13 illustrates the ROC curve of Mounting/Dismounting event recognition, where “TPR” is a true positive rate and “FPR” is a false positive rate. The classification accuracy of Loading/Unloading vs. everything else is 98.46% and an ROC curve is shown in FIG. 14. FIG. 14 shows the ROC curve of Loading/Unloading event recognition, where “TPR” is a true positive rate and “FPR” is a false positive rate.

Additionally, a VIP detection and recognition system has been developed. The inputs of the system are vehicle and people tracks. The outputs of the system are normalized VIP scores for each person. A screen shot of the VIP detection and recognition system is shown in FIG. 15, where the input video, people and vehicle tracks and VIP scores for each person are shown. FIG. 15a shows the input video with people marked in different colored bounding boxes. FIG. 15b shows vehicle and people tracks and FIG. 15c illustrates VIP scores vs. frame number for each person. The upper line in FIG. 15c displays the response of a VIP (corresponding to the lower right box in FIG. 15a and the middle line in FIG. 15b), and the other lines in FIG. 15c represent Non-VIPs (other boxes and lines in FIGS. 15a and 15b, respectively). The response from a VIP is much higher than the responses from the Non-VIPs. The plot of FIG. 15c shows the significant differences in responses of a VIP vs. a Non-VIP.

The examples discussed above provide good results in applying HO2 features to a number of problem domains including temporal segmentation of an action sequence into stages, complex multi-agent event recognition, and behavior and event matching for automated performance evaluation in military training.

It is to be understood that the exemplary embodiments are merely illustrative of the invention and that many variations of the above-described embodiments may be devised by one skilled in the art without departing from the scope of the invention. It is therefore intended that all such variations be included within the scope of the following claims and their equivalents.

1. A non-transitory computer-implemented method comprising: detecting and tracking one or more entities in one or more images; designating one or more of the one or more entities as a reference entity; defining a coordinate system oriented correspondingly with the reference entity; partitioning a space defined by the coordinate system into a plurality of bins; and counting, for each bin, a number of occurrences of an entity class.
2. The method of claim 1 wherein the coordinate system is defined as two or more dimensions for a ground plane and the partitioning the space partitions the ground plane.
3. The method of claim 1 wherein the counting creates a histogram of oriented occurrences (HO2) where occurrences comprise at least one or more attributes of motion of the entity.
4. The method of claim 1, further comprising the step of associating an occurrence of at least one other entity of the plurality of entities located in the one or more images with one of the plurality of non-overlapping bins.
5. The method of claim 1 further comprising automatically detecting, based on the number of occurrences, one or more events associated with the reference entity.
6. The method of claim 1, further comprising associating an occurrence of at least one other entity of the plurality of entities located in the at least one image with one of the plurality of non-overlapping bins.
7. The method of claim 2 wherein the partitioned ground plane moves when the reference entity moves.
8. The method of claim 1, wherein the partitioning comprises employing a parts-based partition, wherein the parts-based partition measures a distance to the reference entity as the shortest distance to a point on the boundary of the reference entity and the bins are defined based on parts of the reference entity.
9. The method of claim 3, further comprising loading a number of occurrences of entities of at least one entity class in at least one of the bins into a vector to define an HO2 feature.
10. The method of claim 9, further comprising the steps of: computing HO2 features for an entity of interest from the plurality of entities over a sliding window of time to form a time sequence; and clustering the time sequence using a clustering algorithm.
11. A non-transitory computer-implemented method for recognizing an event, comprising: detecting and tracking a first set of entities in a first plurality of images; geo-referencing the first set of entities to a defined coordinate system; designating each member of the first set of entities as a positive or negative sample with respect to the event; counting a number of occurrences for each member of the first set of entities within a related region of the coordinate system; building a classifier for the event based on the occurrence counts for the positive and negative samples; and using the classifier to determine whether the event occurs in a second plurality of images.
12. The method of claim 11 wherein the using the classifier further comprises: detecting and tracking a second set of entities in a second plurality of images; and counting a number of occurrences for each member of the second set of entities within a related region of the coordinate system; and applying the classifier for the event based on the occurrence counts for the second set of entities.
13. The method of claim 11 wherein the counting creates a histogram of oriented occurrences (HO2) feature for each member of the first set of entities, the building a classifier is based on the HO2 feature, and the occurrences comprise at least one or more attributes of motion of the entity.
14. The method of claim 13, wherein the classifier is a support vector machine (SVM).
15. The method of claim 14, wherein the SVM is built using the HO2 features of participants as positive samples and non-participants as negative samples.
16. The method of claim 15, wherein the clustering algorithm comprises constructing a hierarchical cluster tree using χ² distance and using a nearest neighbor strategy.
17. The method of claim 16, wherein the distance between two clusters is the smallest distance between objects in the two clusters.
18. The method of claim 17, wherein the clustering algorithm further comprises constructing clusters recursively from the root to leaves of the hierarchical cluster tree based on one of an inconsistence measure and the maximum number of clusters.
19. The method of claim 18, further comprising loading a number of occurrences of entities of at least one entity class in at least one of the bins into a vector to define an HO2 feature.
20. The method of claim 19, further comprising the steps of: computing HO2 features for an entity of interest from the plurality of entities over a sliding window of time to form a time sequence; and clustering the time sequence using a clustering algorithm.